Unsupervised feature selection using clustering ensembles and population based incremental learning algorithm
نویسندگان
چکیده
This paper describes a novel feature selection algorithm for unsupervised clustering, that combines the clustering ensembles method and the population based incremental learning algorithm. The main idea of the proposed unsupervised feature selection algorithm is to search for a subset of all features such that the clustering algorithm trained on this feature subset can achieve the most similar clustering solution to the one obtained by an ensemble learning algorithm. In particular, a clustering solution is firstly achieved by a clustering ensembles method, then the population based incremental learning algorithm is adopted to find the feature subset that best fits the obtained clustering solution. One advantage of the proposed unsupervised feature selection algorithm is that it is dimensionality-unbiased. In addition, the proposed unsupervised feature selection algorithm leverages the consensus across multiple clustering solutions. Experimental results on several real data sets demonstrate that the proposed unsupervised feature selection algorithm is often able to obtain a better feature subset when compared with other existing unsupervised feature selection algorithms.
منابع مشابه
High-Dimensional Unsupervised Active Learning Method
In this work, a hierarchical ensemble of projected clustering algorithm for high-dimensional data is proposed. The basic concept of the algorithm is based on the active learning method (ALM) which is a fuzzy learning scheme, inspired by some behavioral features of human brain functionality. High-dimensional unsupervised active learning method (HUALM) is a clustering algorithm which blurs the da...
متن کاملClustering based on Dirichlet mixtures of attribute ensembles
We propose a model-based approach to identifying clusters of objects based on subsets of attributes, so that the attributes that distinguish a cluster from the rest of the population, called an attribute ensemble, may depend on the cluster being considered. The model is based on a Pólya urn cluster model, which is equivalent to a Dirichlet process mixture of multivariate normal distributions. T...
متن کاملEvolving Ensembles of Feature Subsets towards Optimal Feature Selection for Unsupervised and Semi-supervised Clustering
The work in unsupervised learning centered on clustering has been extended with new paradigms to address the demands raised by real-world problems. In this regard, unsupervised feature selection has been proposed to remove noisy attributes that could mislead the clustering procedures. Additionally, semi-supervision has been integrated within existing paradigms because some background informatio...
متن کاملOnline Streaming Feature Selection Using Geometric Series of the Adjacency Matrix of Features
Feature Selection (FS) is an important pre-processing step in machine learning and data mining. All the traditional feature selection methods assume that the entire feature space is available from the beginning. However, online streaming features (OSF) are an integral part of many real-world applications. In OSF, the number of training examples is fixed while the number of features grows with t...
متن کاملFeatures in Concert: Discriminative Feature Selection meets Unsupervised Clustering
Feature selection is an essential problem in computer vision, important for category learning and recognition. Along with the rapid development of a wide variety of visual features and classifiers, there is a growing need for efficient feature selection and combination methods, to construct powerful classifiers for more complex and higherlevel recognition tasks. We propose an algorithm that eff...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
- Pattern Recognition
دوره 41 شماره
صفحات -
تاریخ انتشار 2008